Abstract. Fractals are self-similar recursive structures that have been
used in modeling several real world processes. In this work we study how
"fractal-like" processes arise in a prediction game where an adversary is
generating a sequence of bits and an algorithm is trying to predict them.
We will see that under a certain formalization of the predictive payoff for
the algorithm it is optimal for the adversary to produce a fractal-like
sequence to minimize the algorithm's ability to predict. Indeed it has been
suggested before that financial markets exhibit a fractal-like behavior
[1,2]. We prove that a fractal-like distribution arises naturally out of an
optimization from the adversary's perspective.

In addition, we give optimal trade-offs between predictability and expected
deviation (i.e. sum of bits) for our formalization of predictive payoff.
This result is motivated by the observation that several time series data
exhibit higher deviations than expected for a completely random walk.

1 Introduction

Consider an adversary who is producing a sequence of bits (each bit is $+1$
or $-1$) and an algorithm that, having seen a certain number of bits, is
interested in predicting the next $x$ bits. Say the algorithm gets a payoff
of $1$ for every bit that it predicts correctly and $-1$ for every bit where
it is wrong. This is like an idealized stock market where each day the price
changes by $+1$ or $-1$ percent and the algorithm is required to make a bet
on the daily direction. We ask what is the most adversarial distribution on
sequences of bits so as to minimize the algorithm's payoff. Clearly the
uniform distribution where every bit is chosen independently and uniformly
at random is the most adversarial, since the expected payoff of any
algorithm is always exactly 0.

Given a sequence $s$ of bits, let $h(s)$ be the sum of the bits in $s$, i.e.
the height of the sequence when plotted cumulatively. We will refer to the
magnitude of height as deviation. For $s \in \{-1,1\}^T$ chosen uniformly at
random the typical deviation of $s$ is $\Theta(\sqrt{T})$.

The question we study here is: what is the most adversarial distribution on
sequences if the distribution is required to be heavy-tailed, say the
typical height should be $k\sqrt{T}$ where $k > 1$. Indeed it has been
observed in several studies that the distribution of financial time series
is heavy-tailed [3,4]. A natural heavy-tailed distribution is to pick a
random string conditioned on its height being at least $k\sqrt{T}$. This is
essentially the highest entropy distribution with the property that the
typical height is around $k\sqrt{T}$. However the highest entropy
distribution is not the least predictable. Indeed for large $k$, it tends to
rise/drop rather linearly to its final height. Thus by observing the initial
segment of bits, the algorithm can easily infer the direction of the
remaining bits to get a large payoff.

One distribution that has been suggested for financial markets is the
Fractional Brownian Motion (FBM) [5,6], which is a generalization of the
Brownian motion. For our purposes, the Brownian motion can be thought of as
a continuous variant of the uniform distribution on bits. FBM is
characterized by a single parameter $H$ which is called the Hurst parameter,
and the typical height achieved by sequences drawn from FBM($H$) is around
$T^H$. For $H > 1/2$, the increments of FBM are positively correlated while
the case $H = 1/2$ corresponds to Brownian motion.

To make our question precise we introduce a measure of unpredictability for
a distribution which is motivated by the notion that the expected payoff of
an algorithm on an interval $I$, having observed the previous bits, should
be small compared to the standard deviation of height in $I$. Intuitively,
we are enforcing a low signal-to-noise ratio.



Definition 1 Let $D$ be a distribution which produces bits in an online
fashion and $s$ be the sequence of bits that have been produced immediately
preceding an interval $I$. Let $E[A_s(I)]$ denote the expected payoff of an
algorithm $A$ on interval $I$ (where the bits in $I$ are produced according
to $D$ conditioned on having produced $s$ immediately before $I$). Note that
$A$ must fix its prediction for $I$ based solely on $s$ and before looking
at any bits within $I$.

We say that $D$ is $\delta$-unpredictable if for all $A$, $s$ and $I$,
$E[A_s(I)] \le \delta \cdot \sqrt{|I|}$.

Fig. 1: Growth charts for two different types of adversarial sequences. The
first is the cumulative plot of a random i.i.d. sequence with a constant
upward bias. The second is an $\alpha$-inverting sequence as in Definition 2.
Note that the latter plot seems to change direction more significantly than
the former.



For example an algorithm may notice a high density of $+1$'s and may decide
to predict $+1$ for the next $x$ bits (this would correspond to "buying" a
stock). Note that $\sqrt{x}$ is the standard deviation in the payoff of an
algorithm for the uniform distribution on $x$-bit sequences and thus we are
asking that the payoff of the algorithm for a $\delta$-unpredictable
distribution is negligible compared to this standard deviation (we will in
fact construct distributions where the standard deviation is much higher
than $\sqrt{x}$). Roughly, this is equivalent to saying that the signal to
noise ratio in any interval is negligible.
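
As a sanity check on the $\sqrt{x}$ scale, the following minimal sketch enumerates all $x$-bit sequences and computes the exact mean and standard deviation of the payoff of a fixed prediction under the uniform distribution. The function names `payoff` and `payoff_mean_and_std` are illustrative and not part of the paper.

```python
from itertools import product
from math import sqrt


def payoff(prediction, bits):
    """+1 for every correctly predicted bit, -1 for every wrong one."""
    return sum(1 if p == b else -1 for p, b in zip(prediction, bits))


def payoff_mean_and_std(prediction):
    """Exact mean and standard deviation of the payoff when the bits are
    uniform over {-1, +1}^x, obtained by enumerating all 2^x sequences."""
    x = len(prediction)
    payoffs = [payoff(prediction, bits) for bits in product((-1, 1), repeat=x)]
    mean = sum(payoffs) / len(payoffs)
    var = sum((v - mean) ** 2 for v in payoffs) / len(payoffs)
    return mean, sqrt(var)


# Illustrative use: always predict +1 on an interval of 10 bits.
mean, std = payoff_mean_and_std([1] * 10)
```

Under these assumptions the mean is 0 and the standard deviation equals $\sqrt{10}$, which is the scale against which a $\delta$-unpredictable distribution caps the expected payoff.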

We ask what is the maximum deviation that can be achieved by a
$\delta$-unpredictable distribution $D$. We will look at maximizing measures
such as median deviation or mean deviation $E_{s \sim D}[|h(s)|]$ (we will
show that our claims hold with respect to any of these measures).

We show that there is a $\delta$-unpredictable distribution which achieves a
deviation of $\sqrt{T}(1 + \Omega(\delta \log T))$. Thus, the deviation can
be $\omega(\sqrt{T})$ for $\delta = o(1)$. The distribution we construct is
a variant of a discretization of FBM. We also show that the highest
deviation that can be achieved by a $\delta$-unpredictable distribution is
$\sqrt{T}(1 + O(\delta \log T))$. In addition, we construct a distribution
which is a simple discretization of FBM and show that the deviation achieved
by this distribution is $T^{1/2+\Theta(\delta)}$. Though this distribution
is not $\delta$-unpredictable, it satisfies a similar but weaker property.

A nice property of $\delta$-unpredictable distributions is that they are
"fractal-like" in some sense. We use the terms fractal-like and fractal
somewhat interchangeably. Normally a fractal is considered to be a
self-similar recursive structure in Euclidean space (usually with
non-integer dimension to exclude trivial patterns). Traditionally this has
not been applied to bit sequences. Therefore we refrain from calling such
sequences strictly a fractal. To formalize our "fractal-like" property, we
first define a notion of inversion for a deterministic sequence. The
property essentially says that if in any interval there is a huge rise, then
there must be a sub-interval where there is a proportionally big fall, and
vice versa.

Definition 2 ($\alpha$-Inversion) Given a sequence $s \in \{-1,1\}^T$, it is
said to be $\alpha$-inverting if for every interval $X$ within $[1,T]$ (of
at least some constant length) there is a sub-interval $Y$ such that
$h(s_X)$ and $h(s_Y)$ are of opposite sign and
$|h(s_Y)|/|h(s_X)| \ge \alpha$. Here by $s_I$ we mean the sequence $s$
restricted to interval $I$. We refer to the largest feasible $\alpha$ as the
inversion ratio of $s$.
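
To make the definition concrete, here is a brute-force sketch that computes the inversion ratio of a short sequence. It rests on two assumptions that the definition leaves open: the "constant length" threshold is exposed as a hypothetical parameter `min_len`, and intervals $X$ with $h(s_X) = 0$ are skipped because the ratio is undefined there. The running time is polynomial but high, so this is only meant for small examples.

```python
def inversion_ratio(s, min_len=2):
    """Largest alpha for which s is alpha-inverting, by exhaustive search.

    For every interval X of length >= min_len with nonzero height, find the
    sub-interval Y whose height has the opposite sign and the largest
    magnitude, and take the minimum of |h(s_Y)| / |h(s_X)| over all X.
    Returns float('inf') if no interval X qualifies.
    """
    T = len(s)
    prefix = [0]
    for bit in s:
        prefix.append(prefix[-1] + bit)

    best = float("inf")
    for i in range(T):
        for j in range(i + min_len, T + 1):
            hx = prefix[j] - prefix[i]
            if hx == 0:
                continue  # assumption: ratio undefined, skip this interval
            opposite = 0
            for a in range(i, j):
                for b in range(a + 1, j + 1):
                    hy = prefix[b] - prefix[a]
                    if hy * hx < 0:
                        opposite = max(opposite, abs(hy))
            best = min(best, opposite / abs(hx))
    return best
```

For instance, a monotone run such as `[1, 1, 1, 1]` has inversion ratio 0, since no sub-interval ever moves against the rise.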

Observe that an $\alpha$-inverting sequence resembles a fractal in a certain
sense. To see this, note that in a sequence $s$ such that $h(s) > 0$, if we
locate the biggest contiguous rise, it may be divided into three parts
$s_1 s_2 s_3$ where $s_2$ has a net downward slope and $s_1, s_3$ have a
positive slope each. But one can recurse and divide each of the three
substrings further into three parts each and thus the sequence has a
recursive, self-similar structure.

We show that any $\delta$-unpredictable distribution is $\alpha$-inverting
in a certain sense. Since we are dealing with a distribution rather than a
deterministic sequence we need an appropriate generalization of Definition
2, which is stated in Section 1.1. It will be clear from the definition that
the highest entropy sequence we discussed earlier has a very small inversion
ratio compared to $\delta$-unpredictable distributions.



1.1 Main results 

In this section we describe our main results in more detail. As we mentioned
earlier, the adversarial distributions we construct are closely related to
and inspired by FBM.

FBM with parameter $H$ is the unique continuous time, Gaussian process
$B_H(t)$ which satisfies $B_H(0) = 0$, $E[B_H(t)] = 0$ for all $t$ and has
covariance function:

$$E[B_H(t) B_H(s)] = \frac{1}{2}\left(|t|^{2H} + |s|^{2H} - |t - s|^{2H}\right)$$

The process $B_H$ is translation invariant and is self-similar in the sense
that $\{B_H(at) : t \in \mathbb{R}\}$ is identical in distribution to
$\{a^H B_H(t) : t \in \mathbb{R}\}$ for all $a > 0$. Furthermore, $B_H(t)$
is normally distributed with variance $t^{2H}$. Thus any interval of length
$t$ has deviation about $t^H$. The case $H = 0.5$ corresponds to the
standard Brownian motion.
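
The covariance function can be evaluated directly. The sketch below (function names are illustrative) computes $\frac{1}{2}(|t|^{2H} + |s|^{2H} - |t-s|^{2H})$ and, from it, the covariance of two adjacent unit increments $B_H(1) - B_H(0)$ and $B_H(2) - B_H(1)$, which works out to $2^{2H-1} - 1$ and is therefore positive exactly when $H > 1/2$.

```python
def fbm_cov(t, s, H):
    """Covariance E[B_H(t) B_H(s)] of fractional Brownian motion."""
    return 0.5 * (abs(t) ** (2 * H) + abs(s) ** (2 * H) - abs(t - s) ** (2 * H))


def adjacent_increment_cov(H):
    """Covariance of B_H(1) - B_H(0) and B_H(2) - B_H(1).

    Expands to cov(1, 2) - cov(1, 1), using B_H(0) = 0.
    """
    return fbm_cov(1.0, 2.0, H) - fbm_cov(1.0, 1.0, H)
```

For $H = 0.5$ the covariance reduces to $\min(t, s)$ for nonnegative $t, s$ and the adjacent increments are uncorrelated, which matches the standard Brownian motion.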

The analysis of the FBM usually requires an understanding of integrated
Wiener processes. The first adversarial distribution we construct is a
discrete variant of the FBM that produces bits instead of real numbers. We
denote this distribution as Fractal Random Walk (FRW).

The sequence is constructed recursively in lengths that are powers of 2. To
produce a sequence of length $2n$, we concatenate two recursively
constructed sequences of length $n$ each, and change the height of the
second sequence by a factor proportional to the height $h$ of the first
sequence. This is done by flipping approximately $\delta h$ $(-1)$'s to
$+1$'s if $h > 0$ (and $+1$'s to $-1$'s otherwise). A formal description of
the construction appears in Section 3.

While this lacks the translation invariance and the exact self-similarity
properties of the FBM, it still has the property that any interval of size
$t$ has deviation $t^{1/2+\Theta(\delta)}$.

To see this, note that if $h_1, h_2$ denote the heights of the two sequences
that are concatenated to produce the sequence of length $2n$ after altering
the second string then $E[h_1 h_2] = 2\delta E[h_1^2] = 2\delta E[H(n)^2]$
where $H(n)$ is a random variable that denotes the height of a random
sequence of length $n$ drawn from FRW. So
$E[H(2n)^2] = E[(h_1 + h_2)^2] = E[h_1^2] + E[h_2^2] + 2E[h_1 h_2] = (2 + \Theta(\delta)) E[H(n)^2]$.
The recurrence works out to a root mean square deviation
($\sqrt{E[H(n)^2]}$) of about $n^{1/2+\Theta(\delta)}$.

This informal description skips over technical issues such as
discretization. Furthermore, extending this argument to show that the high
deviation is achieved with constant probability is more complicated and is
done in Theorem 7. Note that a constant probability bound for achieving a
particular deviation is stronger than showing a high deviation in
expectation (using Markov's inequality). We note that this distribution is
not $\delta$-unpredictable but satisfies a weaker property (Theorem 20). For
completeness, we show that the FBM (continuous version) with
$H = 1/2 + \delta$ is also not $\delta$-unpredictable in the strict sense
(Claim [PJ). We also note that the highest entropy distribution is very poor
in terms of $\delta$-unpredictability (Claim 25).



We construct another distribution, which we call Optimal Fractal Random Walk
(Opt-FRW), which has optimal trade-offs between deviation and
predictability. The distribution Opt-FRW is a simple but important twist on
the above process where instead of flipping $\delta \cdot h$ bits, we flip
$\delta \cdot \sqrt{n}$ bits in the direction of $h$.

Theorem 3. (Theorems 12, 18 and 13) The distribution Opt-FRW is
$O(\delta)$-unpredictable and achieves a deviation of
$\sqrt{T}(1 + \Omega(\delta \log T))$ with constant probability. Further, no
$\delta$-unpredictable distribution can achieve an expected deviation higher
than $\sqrt{T}(1 + O(\delta \log T))$.

We now turn to formalizing the relationship between
$\delta$-unpredictability and the "fractal-like" property of a distribution.

For a deterministic sequence we show that an $\alpha$-inverting sequence
with the highest deviation is a fractal.

Theorem 4. (Claim [C])

Let $s$ be an $\alpha$-inverting sequence of length $t$ (Definition 2),
where $\alpha$ is bounded above by a constant. Then the highest deviation
that can be achieved by $s$ for large $t$ is $t^\theta$ where $\theta$ is
the solution to the equation
$1 = 2((1 + \alpha)/2)^{1/\theta} + \alpha^{1/\theta}$. Furthermore, this
deviation is actually achieved by an appropriately designed fractal.
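
The exponent in Theorem 4 has no closed form in general but is easy to compute numerically. The sketch below reads the defining equation as $1 = 2((1+\alpha)/2)^{1/\theta} + \alpha^{1/\theta}$ and solves it by bisection on $\theta \in (0, 1)$, using the fact that the right-hand side is increasing in $\theta$ for $0 < \alpha < 1$. The function name, the bracket $[10^{-3}, 1]$ and the iteration count are assumptions made for this illustration.

```python
def deviation_exponent(alpha, iters=200):
    """Solve 1 = 2*((1+alpha)/2)**(1/theta) + alpha**(1/theta) for theta."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")

    def f(theta):
        e = 1.0 / theta
        return 2 * ((1 + alpha) / 2) ** e + alpha ** e - 1

    lo, hi = 1e-3, 1.0  # f(hi) = 2*alpha > 0
    if f(lo) >= 0:
        raise ValueError("root lies below the bracket; alpha is too close to 1")
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

As a check, $\alpha = 1/3$ gives $\theta = 1/2$, because $2(2/3)^2 + (1/3)^2 = 1$; smaller inversion ratios give larger exponents.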

For distributions $D$ over sequences we define the following variant of the
earlier inversion rule.

Definition 5 ($(\alpha, q)$-Inversion) A distribution $D$ is said to be
$(\alpha, q)$-inverting if for any interval $X$ (of at least some constant
length) with median deviation $\Delta = \Omega(\delta\sqrt{|X|})$, with
probability at least $q$ there is a sub-interval $Y$ such that $h(s_X)$ and
$h(s_Y)$ are of opposite sign and $|h(s_Y)| \ge \alpha \cdot \Delta$. Here
by $s_I$ we mean the sequence $s$ restricted to interval $I$. This should
hold even if one conditions on a given history of bits seen before the
interval $X$.



We note (see Observation 24) that a uniform random sequence is
$(\alpha, q)$-inverting for some constants $\alpha, q$. Further the
probability parameter $q$ can be made as high as $1 - \epsilon$ by reducing
the inversion ratio $\alpha$ to $\Theta(1/\log(1/\epsilon))$.

The following theorem establishes that every $\delta$-unpredictable
distribution must be fractal-like in the sense that it is
$(\alpha, q)$-inverting with constant $\alpha$ and $q$.

Theorem 6. (Theorems 16 and 17) For $\delta$ small enough, any
$\delta$-unpredictable distribution is also $(\alpha, q)$-inverting for some
constants $\alpha, q$. Further by dropping the inversion ratio $\alpha$ to
$\Theta(1/\log T)$ the probability $q$ can be made as high as
$1 - 1/T^{\Omega(1)}$ for all intervals of length at least
$\Omega(\log T)$. Thus the condition holds with high probability
simultaneously for all such intervals.



1.2 Related Work 

Many studies support the thesis that fractals occur naturally in several
real world processes in diverse fields such as physics, finance and
geography [7,8,9]. Ralph Elliott [1], a professional accountant, suggested
the use of fractal-like "waves" in understanding financial markets. Fractal
models for finance have also been studied widely in the academic community.
Fractional Brownian Motion (FBM) was introduced as a variant to the well
known Brownian Motion by Mandelbrot and van Ness in [5]. In addition to
financial time series modeling, FBM has also found applications in the study
of network traffic and fluid turbulence [10,6].

The reason for considering FBM rather than the standard Brownian motion for
financial modelling was the observation that the distribution of financial
time series is heavy-tailed [3,4]. This means that the deviations achieved
are a bit higher than those expected for Brownian motion. It has been argued
that modeling S&P500 price data according to FBM produces an estimated value
of the Hurst parameter $H$ slightly over the 0.5 value that corresponds to
the standard Brownian Motion [11]. Values of $H > 0.5$ allow for long range
(positive) correlations in the time series that result in a higher than
normal deviation. Besides FBM, other models such as p-stable distributions
and Lévy distributions [12,4,13] provide an alternate explanation for the
heavy-tailed nature of time series data by allowing heavier tails for the
price changes in each unit time that are independent across time. In
contrast, the FBM uses normally distributed price changes in each unit time,
and the high deviations are achieved by correlations across time.

Works such as [14,15] have analyzed the level of arbitrage present in FBM.
The authors in [16] have analyzed the predictability of the FBM using a
different loss function from ours. Other researchers [17,18] have studied
the prediction problem as a game between an algorithm and an adversary, and
derived that the optimal strategy for the adversary resembles a Brownian
Motion. The work in [18] was inspired by [19] where the authors provide
robust upper and lower bounds for pricing European call options, under the
no-arbitrage assumption when the price process is assumed to be discrete and
discontinuous as opposed to the Black-Scholes model [20] where the price
process is taken to be continuous.

1.3 Discussion and Future work 

Note that our notion of $\delta$-unpredictable requires the algorithm to fix
its prediction for an entire interval $I$ before looking at any of its bits.
A stronger notion of unpredictability is to allow the algorithm to change
its prediction for the interval after looking at bits within $I$. In other
words, at every point the algorithm tries to simply predict the next bit,
based on the bits it has seen so far. One could ask what is the most
adversarial distribution in this setting which achieves a high deviation. In
this setting, for any sequence $s$, a bounded regret algorithm such as
Weighted Majority can achieve a payoff of $|h(s)| - c\sqrt{|s|}$ for an
absolute constant $c$ [21,22]. So for a distribution $D$ which achieves
typical deviation $k\sqrt{T}$, it is always possible to get a payoff of
$(k - c)\sqrt{T}$. It is also fairly straightforward to construct a
distribution $D$ such that no algorithm can achieve an expected payoff
better than $(k - c)\sqrt{T}$ even when it predicts one bit at a time. We
also note that while the distributions inspired by FBM have some guarantees
in terms of $\delta$-unpredictability, they perform poorly in this model
when one is allowed to predict based on all previous bits (see Claim 26).

One possible justification for our notion of $\delta$-unpredictability is
that changing predictions very often may have a cost associated with it.
Although this may be a reasonable assumption (at least for financial
markets), it is only a conjecture at this point and we invite further
comments on this issue.

An interesting direction for further research is to look for natural constraints 
on real world processes which provably result in the formation of fractal-like 
processes. 

2 Preliminaries 

Here is some common notation we use throughout the paper. For a sequence of
bits $s \in \{-1, 1\}^T$, $h(s)$ denotes the sum of bits in $s$, i.e. the
height of $s$. We refer to the magnitude of height as deviation.

We will be working with several aggregate measures of deviation for a
distribution such as median deviation (or generalized median), mean
deviation and root-mean-squared deviation ($\sqrt{E_{s \sim D}[h(s)^2]}$).
Note that mean deviation is no more than root-mean-squared deviation and the
generalized median is bounded by mean deviation up to constant factors using
Markov's inequality (as long as the probability in generalized median is at
least a constant). We will prove our upper bounds for root-mean-squared
deviation and lower bounds for generalized median and so they will hold for
all measures up to constants.

We will typically denote random variables by capital letters and fixed se- 
quences by small letters. 

3 Construction of Adversarial distributions 

In this section we formally construct our adversarial distributions. Each of
these distributions has two parameters, $l$ which is the length of the
sequence in the base case and $\delta > 0$.

We will construct the distributions inductively: having constructed
$D_\delta(n)$ we will show how to construct $D_\delta(2n)$ (the base case
for $n = l$ is simply a random sequence in $\{-1,1\}^l$). In both cases
below, we describe the distribution $D_\delta(2n)$ in terms of how to
generate a sequence $s \sim D_\delta(2n)$ given access to distribution
$D_\delta(n)$.

Fractal Random Walk (FRW$_{l,\delta}$)(2n)

1. Generate sequences $s_1, s_2$ independently according to
FRW$_{l,\delta}(n)$.

2. If the height of $s_1$ is positive, change exactly $\delta \cdot h(s_1)$
$-1$'s in $s_2$ to $1$ (if they exist, otherwise change as many as
possible). Similarly, if the height of $s_1$ is negative, change exactly
$\delta \cdot |h(s_1)|$ $1$'s in $s_2$ to $-1$ (if they exist). Call the
resulting sequence $s_2'$.

3. Set $s = s_1 \cdot s_2'$, i.e. the concatenation of $s_1$ and $s_2'$.

Optimum Fractal Random Walk (OPT-FRW$_{l,\delta}$)(2n)

1. Generate sequences $s_1, s_2$ independently according to
OPT-FRW$_{l,\delta}(n)$.

2. If the height of $s_1$ is positive, change exactly $\delta\sqrt{n}$
$-1$'s in $s_2$ to $1$ (if they exist, otherwise change as many as
possible). Similarly, if the height of $s_1$ is negative, change exactly
$\delta\sqrt{n}$ $1$'s in $s_2$ to $-1$ (if they exist). Call the resulting
sequence $s_2'$.

3. Set $s = s_1 \cdot s_2'$, i.e. the concatenation of $s_1$ and $s_2'$.

Note: Both distributions involve changing exactly $r$ bits in $s_2$ where
$r$ is a real number. Intuitively, we want to change each bit of the
appropriate sign in $s_2$ with probability $r/n$. However, it is simpler to
analyze the deviation of the distributions when we change exactly $r$ bits.
The fact that $r$ is a real number and not an integer will not make much
difference since our base case $l$ will be an increasing function of $T$
(total number of bits to be produced) and so the discretization errors can
be safely ignored.
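
The two constructions can be sketched as a single recursive sampler. This is a minimal illustration under assumptions that the description above leaves open: the real number $r$ is rounded to the nearest integer, the bits to flip are chosen uniformly at random among those of the appropriate sign, and the target length is $l \cdot 2^m$. The names `flip_toward` and `fractal_walk` are hypothetical.

```python
import math
import random


def flip_toward(s2, count, sign, rng):
    """Flip up to `count` randomly chosen entries of s2 from -sign to sign.

    If fewer than `count` such entries exist, all of them are flipped
    ("change as many as possible").
    """
    candidates = [i for i, b in enumerate(s2) if b == -sign]
    rng.shuffle(candidates)
    out = list(s2)
    for i in candidates[:count]:
        out[i] = sign
    return out


def fractal_walk(n, l, delta, rng, optimal=False):
    """Sample a sequence of length n = l * 2^m from FRW (optimal=False)
    or Opt-FRW (optimal=True) with base length l and parameter delta."""
    if n <= l:
        return [rng.choice((-1, 1)) for _ in range(n)]
    half = n // 2
    s1 = fractal_walk(half, l, delta, rng, optimal)
    s2 = fractal_walk(half, l, delta, rng, optimal)
    h1 = sum(s1)
    if h1 != 0:
        # r = delta * sqrt(n) for Opt-FRW, delta * |h(s1)| for FRW,
        # where n is the length of each half; rounding is an assumption.
        r = delta * math.sqrt(half) if optimal else delta * abs(h1)
        s2 = flip_toward(s2, round(r), 1 if h1 > 0 else -1, rng)
    return s1 + s2


# Illustrative use with a fixed seed.
sample = fractal_walk(64, 4, 0.2, random.Random(0), optimal=True)
```

Passing an explicit `random.Random` instance keeps the sampler reproducible, which is convenient when comparing the empirical deviation of the two variants.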

3.1 High deviation 

In this section we show that the distributions we constructed achieve high
deviation with constant probability. What follows is a proof sketch for high
deviation of distribution FRW$_{l,\delta}$. Due to space constraints, the
proof for Opt-FRW$_{l,\delta}$ and for the intermediate claims appears in
the appendix (Section B).

Theorem 7. The distribution FRW$_{l,\delta}(T)$ achieves a deviation of
$T^{1/2+\Theta(\delta)}$ with probability at least $1/2 - \epsilon$ where
$\epsilon$ is at most an inverse polynomial in $T$.

Proof: To analyze the height distribution of FRW$_{l,\delta}$ it will be
more convenient to define another process which is similar to
FRW$_{l,\delta}$ but which can assume integer values instead of bits.

Augmented Fractal Random Walk (AFRW$_{l,\delta}$)(2n)

1. Generate sequences $s_1, s_2$ independently according to
AFRW$_{l,\delta}(n)$.

2. If the height of $s_1$ is positive, change exactly $\delta \cdot h(s_1)$
$-1$'s in $s_2$ to $1$ (if they exist). Similarly, if the height of $s_1$ is
negative, change exactly $\delta \cdot |h(s_1)|$ $1$'s in $s_2$ to $-1$ (if
they exist). Call the resulting sequence $s_2'$.

3. Augment: If there aren't enough $-1$'s to flip in $s_2$, then add 2 to
some of the numbers so that the increase in height is exactly
$\delta \cdot h(s_1)$. Similarly for $1$'s.

4. Set $s = s_1 \cdot s_2'$, i.e. the concatenation of $s_1$ and $s_2'$.

For the random variable $S \sim$ AFRW$_{l,\delta}$, we can exactly
characterize the distribution of $h(S)$.



Claim 8 (Claim\2^) For $n = 2^m \cdot l$ and $S \sim$ AFRW$_{l,\delta}(n)$,

$$h(S) = \sum_{U \subseteq [m]} r^{|U|}\, h(X_U) \qquad (3.1)$$

where $r = (1 + \delta)$ and each $X_U$ is independently and uniformly
distributed in $\{-1,1\}^l$.

We then apply the Berry-Esseen theorem (Theorem 28) to show that the
deviation of $|h(\mathrm{AFRW}_{l,\delta})|$ is high.

Lemma 1. (Lemma\^) Median of $|h(\mathrm{AFRW}_{l,\delta}(n))|$ is
$n^{1/2+\Theta(\delta)}$.

Next we show that the probability of executing step Augment in
AFRW$_{l,\delta}$ is exponentially small. Note that when constructing a
sequence of size $T$, the inductive steps of distribution AFRW$_{l,\delta}$
are executed at most $2T$ times. We show that when starting with sequences
of size $l$ where $l = 100 \log T$, the probability that sequence $s_2$
doesn't have enough $1$'s or $-1$'s to flip at a particular stage is at most
an inverse polynomial in $T$. Thus, taking a union bound over all inductive
steps, we get the desired result.

Claim 9 (Claim 22) The probability that step Augment is executed at a
particular step is at most an inverse polynomial in $T$.

When the step Augment is not executed, the distributions AFRW and FRW are
identical. Thus, the probability that the distribution FRW$_{l,\delta}(T)$
achieves a deviation of $T^{1/2+\Theta(\delta)}$ is at least
$1/2 - \epsilon$.



3.2 Unpredictability 

In this section we show that the distribution OPT-FRW$_{l,\delta}$ is
$O(\delta)$-unpredictable.

We first observe that it suffices to work with aligned intervals, i.e.
intervals which start and end at appropriate powers of 2.

Definition 10 (Aligned interval)

We assume here that $T$ is a power of 2. An aligned interval is one which is
obtained by breaking $[1,T]$ into $2^i$ equal parts for $i \in [0, \log T]$
and picking one of the parts. So for instance the first part is always
$[1, T/2^i]$.

In other words, an interval $[p+1, p+x]$ given by $p \in [0,T]$,
$x \in [1, T-p]$ is said to be an aligned interval if $p = j \cdot 2^i$ and
$x = 2^i$ for some $i \in [0, \log T]$ and $j \in [0, T/2^i - 1]$.
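
The second form of the definition translates directly into a membership test. The helper below is an illustrative sketch (the name `is_aligned` is not from the paper): an interval $[p+1, p+x]$ is aligned when $x$ is a power of 2 and $p$ is a multiple of $x$.

```python
def is_aligned(p, x, T):
    """True if [p+1, p+x] is an aligned interval of [1, T].

    Assumes T is a power of 2, as in Definition 10.
    """
    if p < 0 or x < 1 or x > T - p:
        return False
    x_is_power_of_two = (x & (x - 1)) == 0
    return x_is_power_of_two and p % x == 0
```

For $T = 8$, the intervals $[1,8]$, $[5,8]$ and $[4,4]$ are aligned, while $[3,6]$ is not because its starting offset $p = 2$ is not a multiple of its length.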

Claim 11 If distribution $D(T)$ is $\epsilon$-unpredictable with respect to
all aligned intervals then it is $c \cdot \epsilon$-unpredictable with
respect to all intervals, where $c$ is an absolute constant.

The proof of Claim 11 is fairly straightforward and is moved to the appendix
(Claim 23).



Theorem 12. The distribution Opt-FRW$_{l,\delta}$ is
$O(\delta)$-unpredictable.

Proof: [Sketch]

It can be shown that the process Opt-FRW$_{l,\delta}$ has very similar
properties if in Step 2 of the construction, instead of changing exactly
$\delta \cdot \sqrt{n}$ bits in $s_2$ we change each bit (of appropriate
sign) in $s_2$ with probability $\delta/\sqrt{n}$. Here we assume this fact
without proving it.

We need to show that for every $A$, $s$ and $I$,
$E[A_s(I)] \le O(\delta) \cdot \sqrt{|I|}$ where $s$ and $I$ are as in
Definition 1. We may assume that $I$ is an aligned interval (Claim 11).



From the construction it is clear that $E[A_s(I)]$ is largest when
$h(s) = |s|$ or $h(s) = -|s|$, i.e. all the bits before $I$ are of the same
sign. Without loss of generality assume $h(s) = |s|$. Also, if there were no
prefix (i.e. $|s| = 0$) then $E[A_s(I)] = 0$ since the construction is
symmetric. To provide an upper bound on $E[A_s(I)]$ we simply need to bound
the expected number of $-1$'s which are changed to $+1$'s due to the
existence of $s$. We will use a simple union bound on the total probability
of changing a $-1$ to a $1$ according to the construction. This probability
can be split into 2 parts, the first which occurs because of bit sequences
immediately preceding $I$ of length less than $|I|$ and the second because
of bit sequences immediately preceding $I$ of length more than $|I|$. For
sequences of the first kind, the number of bits changed in $I$ is exactly
$\delta \cdot \sqrt{l}$ while for sequences of the second kind we may assume
that the expected number of bits changed in $I$ is $\delta |I|/\sqrt{l}$,
where $l$ is the length of the bit sequence under discussion. Thus, the
total probability is bounded by:



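$$\sum_{i=1}^{\infty} \frac{\min(|I|, 2^i) \cdot \delta}{\sqrt{2^i}} = \sum_{i=1}^{\log |I|} \delta \cdot \sqrt{2^i} + \sum_{i=\log |I| + 1}^{\infty} \delta \cdot \frac{|I|}{\sqrt{2^i}}$$

Both terms can be bounded by
$\delta \cdot \sqrt{|I|} \cdot \sum_{i \ge 0} 1/\sqrt{2^i}$ and so the
combined sum is at most $O(\delta) \cdot \sqrt{|I|}$.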
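
The bound can be checked numerically. The sketch below evaluates the sum $\sum_{i \ge 1} \min(|I|, 2^i) \cdot \delta / \sqrt{2^i}$, truncated at a finite number of levels (an assumption that only drops a geometrically small tail), and compares it with $2(2 + \sqrt{2}) \cdot \delta \sqrt{|I|}$, where $2 + \sqrt{2} = \sum_{i \ge 0} 2^{-i/2}$. The function name `flip_bound` is illustrative.

```python
from math import sqrt


def flip_bound(interval_len, delta, levels=60):
    """Truncated value of sum_{i>=1} min(|I|, 2^i) * delta / sqrt(2^i)."""
    return sum(
        min(interval_len, 2 ** i) * delta / sqrt(2 ** i)
        for i in range(1, levels + 1)
    )


# Geometric series sum_{i>=0} 2^(-i/2) = 1 / (1 - 1/sqrt(2)) = 2 + sqrt(2).
GEOMETRIC_SUM = 2 + sqrt(2)
```

For every power-of-two interval length the computed sum stays below $2 \cdot (2 + \sqrt{2}) \cdot \delta \sqrt{|I|}$, consistent with the $O(\delta) \cdot \sqrt{|I|}$ claim.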



4 Deviation upper bound for Adversarial Distributions 

In this section we prove that the deviation achieved by Opt-FRW is
essentially the best possible for a $\delta$-unpredictable distribution up
to a constant factor.

Theorem 13. The highest Root-Mean-Square deviation that can be achieved by a
$\delta$-unpredictable distribution on sequences of length $T$ is
$\sqrt{T}(1 + O(\delta) \log T)$.



Proof:

Let $\mathcal{D}_\delta(T)$ be the set of all $\delta$-unpredictable
distributions over sequences of length $T$, and let
$h_n = \max_{D \in \mathcal{D}_\delta(n)} E_{s \sim D}[h(s)^2]$. Clearly,
$h_1 = 1$. We need to show that
$\sqrt{h_T} = \sqrt{T}(1 + O(\delta) \log T)$.

Let $D(n)$ be a $\delta$-unpredictable distribution which maximizes
$E_{s \sim D}[h(s)^2]$. Given a sequence $s \sim D$, we write $s = s_1 s_2$
where $s_1$ and $s_2$ are of length $n/2$ each. Then we have,

$$\begin{aligned}
h_n &= E_{s \sim D}[h(s)^2] = E[(h(s_1) + h(s_2))^2] \\
&= E[h(s_1)^2] + E[h(s_2)^2] + 2E[h(s_1) h(s_2)] \\
&\le 2h_{n/2} + 2 \sum_{x=-n/2}^{n/2} \Pr[h(s_1) = x] \cdot x \cdot E[h(s_2) \mid h(s_1) = x] \\
&\le 2h_{n/2} + 2\delta\sqrt{n/2} \sum_{x=-n/2}^{n/2} \Pr[h(s_1) = x] \cdot |x| \\
&= 2h_{n/2} + \delta \cdot \sqrt{2n} \cdot E[|h(s_1)|] \\
&\le 2h_{n/2} + \delta \cdot \sqrt{2n} \cdot \sqrt{E[h(s_1)^2]} \\
&\le 2h_{n/2} + \delta \cdot \sqrt{2n} \cdot \sqrt{h_{n/2}}
\end{aligned}$$



The first inequality follows from the definition of $h_{n/2}$. The second
inequality follows from the fact that the distribution of $s_2$ is also
$\delta$-unpredictable.
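
To see how the recurrence $h_n \le 2h_{n/2} + \delta \sqrt{2n} \sqrt{h_{n/2}}$ behaves, the sketch below iterates it with equality from $h_1 = 1$ and returns $\sqrt{h_T}$. Writing $g_n^2 = h_n/n$, the recurrence becomes $g_n^2 \le g_{n/2}^2 + \delta g_{n/2} \le (g_{n/2} + \delta/2)^2$, so the iterate should never exceed $\sqrt{T}(1 + (\delta/2)\log_2 T)$. The function name is illustrative and $T$ is assumed to be a power of 2.

```python
from math import sqrt


def worst_case_rms(T, delta):
    """Iterate h_n = 2*h_{n/2} + delta*sqrt(2n)*sqrt(h_{n/2}) from h_1 = 1
    up to n = T (a power of 2) and return sqrt(h_T)."""
    h, n = 1.0, 1
    while n < T:
        n *= 2  # n is now the length of the concatenated sequence
        h = 2 * h + delta * sqrt(2 * n) * sqrt(h)
    return sqrt(h)
```

With $\delta = 0$ the iteration returns exactly $\sqrt{T}$, the root-mean-square deviation of a uniform random walk; for $\delta > 0$ the excess over $\sqrt{T}$ grows with $\delta \log T$.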



Let's substitute $g_n^2 := h_n/n$. Then $h_{n/2} = (n g_{n/2}^2)/2$ and
$\sqrt{h_{n/2}} = \sqrt{n/2} \cdot g_{n/2}$. Thus, we get

$$g_n^2 \le g_{n/2}^2 + \delta g_{n/2} \le (g_{n/2} + \delta/2)^2 \implies g_n \le g_{n/2} + \delta/2$$

Since $g_1 = 1$, this gives the upper bound $g_n \le 1 + (\delta/2) \log n$.
This implies $\sqrt{h_T} = \sqrt{T} \cdot g_T \le \sqrt{T}(1 + (\delta/2) \log T)$.

A Fractal nature of Adversarial Distributions

Here we show that any distribution which is $\delta$-unpredictable must have
a fractal-like nature (Theorem 6). We will first show that
$\delta$-unpredictable distributions are also unpredictable in a slightly
stronger sense.

Definition 14 (Adaptive interval algorithm) An interval prediction algorithm
is said to be adaptive if it can choose to stop making predictions on
interval $I$ at any point within $I$ based on the bits it has seen so far.
Note that we do not allow the prediction of the algorithm to depend on the
bits in $I$; the only decision the algorithm can make based on bits in $I$
is to stop predicting earlier than the end point of $I$.

Definition 15 (Adaptively $\delta$-unpredictable) A distribution $D$ is said
to be adaptively $\delta$-unpredictable if for any adaptive algorithm $A$,
sequence of bits $s$ and interval $I$,
$E[A_s(I)] \le \delta \cdot \sqrt{x}$ where $x$ is the expected time for
which $A$ continues making a prediction in $I$.

Here the bits in $I$ are produced according to $D$ conditioned on having
produced $s$ immediately before $I$, similarly as in Definition 1.

Theorem 16. A $\delta$-unpredictable distribution is also adaptively
$O(\delta)$-unpredictable.

Proof:

Let $D$ be a $\delta$-unpredictable distribution and $A'$ an adaptive
interval algorithm. We first show that
$E[A'_s(I)] \le 2\delta \cdot \sqrt{|I|}$, i.e. we replace the expected time
for which $A'$ continues making a prediction in $I$ by the maximum time for
which it makes a prediction.

We will construct a non-adaptive algorithm $A$ such that
$E[|A_s(I) - A'_s(I)|] \le \delta \cdot \sqrt{|I|}$. Take $A$ to be the
algorithm that makes the same prediction as $A'$ but never stops before the
end of $I$. Since $E[A_s(I)] \le \delta \cdot \sqrt{|I|}$ ($D$ is
$\delta$-unpredictable) this implies that
$E[A'_s(I)] \le 2\delta \cdot \sqrt{|I|}$.

Let $P_u$ be the probability of producing a sequence of bits $u$ as a prefix
in $I$ according to distribution $D$. Let $\mathcal{E}$ be the set of
sequences $u$ such that the algorithm $A'$ stops making predictions on
seeing $u$. Then $\sum_{u \in \mathcal{E}} P_u = 1$.

Let $P_u(A)$ denote the expected payoff of $A$ on the remaining part of $I$
conditioned on the event that $A'$ has stopped making predictions. Then
$P_u(A) \le \delta \cdot \sqrt{|I| - |u|} \le \delta \cdot \sqrt{|I|}$.
Thus,
$E[|A_s(I) - A'_s(I)|] \le \sum_{u \in \mathcal{E}} P_u \cdot \delta \cdot \sqrt{|I|} = \delta \cdot \sqrt{|I|}$.

Now we extend the proof to the case where $A'$ makes a prediction for
expected time $x$ rather than maximum time $x$.

Let $q_i$ be the probability that $A'$ makes a prediction for time more than
$2^i x$. By Markov's inequality, $q_i \le 2^{-i}$. Also,
$q_i = \sum_{u \notin \mathcal{E},\, |u| = 2^i x} P_u$, where $P_u$ is as
defined above. We will bound the payoff of $A'$ in phases where the $i$-th
phase consists of bits between $2^i x$ and $2^{i+1} x$ from the start of
$I$, and show that it is at most $2\delta \cdot q_i \cdot \sqrt{2^i x}$. For
a fixed sequence $u$, the payoff of algorithm $A'$ in phase $i$ conditioned
on having seen $u$ is at most $2\delta\sqrt{2^i x}$ (proved above). Thus,
the total payoff of $A'$ in phase $i$ is at most
$2\delta \cdot q_i \cdot \sqrt{2^i x}$. Finally, the expected payoff of $A'$
over all phases is at most:

$$\sum_i 2\delta q_i \sqrt{2^i x} \le 2\delta \cdot \sum_i \sqrt{2^i x}/2^i \le O(\delta) \cdot \sqrt{x}$$

which proves that $D$ is adaptively $O(\delta)$-unpredictable.

■ 

Now we turn to showing that any adaptively $\delta$-unpredictable
distribution has a fractal-like nature.

Theorem 17. If a distribution over $T$-bit sequences is adaptively
$\delta$-unpredictable (Definition 15) then it is $(\alpha, q)$-inverting
for some constants $\alpha, q$. Further by dropping the inversion ratio
$\alpha$ to $\Theta(1/\log T)$ the probability $q$ can be made as high as
$1 - 1/T^{\Omega(1)}$ for all intervals of length at least
$\Omega(\log T)$. Thus the condition holds with high probability
simultaneously for all such intervals.

Proof: 

For a certain given history of bits consider the interval /. Let h{I) denote the 
random variable that denotes the height of this interval. Let 9,p be such that 
the deviation in / exceeds with constant probability p (this generalizes the 
case when 6 is the median deviation.) 

We will show that some prefixes of / must achieve height at least ad and —a9 
each with constant probability (where a < 1/2 is a constant). To show this, 
note that either h > 9 or h < —9 with probability at least p/2. Assume it is the 
former without loss of generality. So we only need to prove that h < —a9 with 
probability at least p/A. Assume the contrary and we will see that the interval 
cannot be (5-unpredictable. 

Consider a prediction algorithm that predicts $+1$ for the interval but adaptively
terminates its betting if the height drops to $-\alpha\theta$ or if the height exceeds
$2\alpha\theta$, whichever happens first. Since the algorithm hits the lower limit of $-\alpha\theta$
only with probability at most $p/4$, with probability at least $p/4$ it must realize
the upper limit (payoff) of $2\alpha\theta$ (since $2\alpha < 1$). In all remaining cases the payoff
is at least $-\alpha\theta$. So the expected payoff is at least $(p/4)(2\alpha\theta) - (p/4)(\alpha\theta)$, which
needs to be at most $\delta\sqrt{x}$. This is not possible if $\alpha < 1/2$ and $\theta = \Omega\left(\frac{\delta\sqrt{x}}{\alpha p}\right)$.
Thus if the height in an interval has high magnitude with constant probability,
it must reach in either direction with constant probability.
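
As an illustration of the stopping rule used in this argument, the following is a minimal sketch of the payoff collected by a predictor that always bets $+1$ and stops at the first limit it reaches. The function name `stopped_payoff` and its parameters are hypothetical; in the proof, `lower` would be $-\alpha\theta$ and `upper` would be $2\alpha\theta$.

```python
def stopped_payoff(bits, lower, upper):
    """Payoff of always predicting +1 until the running height first
    reaches `upper` or drops to `lower`; afterwards no more bets are made.

    `bits` is a sequence of +1/-1 values (the interval being predicted).
    """
    height = 0
    for b in bits:
        # Predicting +1 earns +1 on a +1 bit and -1 on a -1 bit,
        # so the payoff so far equals the running height.
        height += b
        if height >= upper or height <= lower:
            break  # adaptive termination at either limit
    return height


print(stopped_payoff([1, 1, 1, -1], lower=-1, upper=2))
```

The payoff is therefore always at least `lower` (for integer limits), which is the fact the expected-payoff bound relies on.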

To convert this into a high probability argument, we will use (at most) $s$
iterations of the above prediction algorithm, each with limits that depend on $\theta/s$
instead of $\theta$. Each iteration has limits of $2\alpha\theta/s$ and $-\alpha\theta/s$ on the sum of bits
seen during its execution. The next iteration is initiated only if either the
upper or the lower limit is reached in the previous iteration and not all $|I|$ bits in
the full interval are exhausted. From the previous argument, conditioned on the
event that a certain iteration is initiated, if an iteration is executed for expected
time $O(|I|/s)$ and hits the upper limit with probability $p/2$ then it must also
hit the lower limit with probability $p/4$. Since the final height exceeds $\alpha\theta$ with
constant probability $p$, in such cases all $s$ iterations have been initiated. Since
there are at most $s$ iterations and all are initiated with constant probability,
at least half of them must have an expected length of $O(|I|/s)$ conditioned on
the event that they are initiated; otherwise the total expected time of all the $s$
iterations will exceed $x$.

Conditioned on the event that the $i$-th iteration is initiated, with probability
$p$ it must hit at least one of its two limits; otherwise the total height will not
reach $2\alpha\theta$ with probability $p$. So conditioned on the event that the $i$-th iteration
is initiated, for at least half the iterations, it must hit the lower limit (and
upper limit) with probability at least $p/4$. So conditioned on the event that all
$s$ iterations are initiated, the probability that none of them hit the lower limit
and also the upper limit is at most $(p/4)^{s/2}$.

Thus, it follows that by choosing $s = O(1)$, we get an $\alpha$-inversion for constant
$\alpha$ with constant probability. This proves the first part of the theorem.

For the second part, note that with probability at least $1 - (p/4)^{s/2}$ either
the final height is less than $2\alpha\theta$ or some subinterval has height $-\alpha\theta/s$. For
$s = O(\log T)$ the probability that the final height exceeds $\theta$ and there is no
inversion of height $< -(\alpha/s)\theta$ is negligible. ■



B Omitted Proofs 

Theorem 18. The distribution Opt-FRW$_{l,\delta}(T)$ achieves a deviation of
$\sqrt{T}(1 + \Omega(\delta\log T))$ with constant probability for $l$ := T"^''^.

Proof: 

To prove the theorem it will be more convenient to define another process
which is similar to Opt-FRW$_{l,\delta}$ but which can assume integer values instead of
bits.

Augmented Optimum Fractal Random Walk (AOpt-FRW$_{l,\delta}(2n)$)

1. Generate sequences $s_1, s_2 \in \{-1, 1\}^n$ independently according to
AOpt-FRW$_{l,\delta}(n)$.

2. If the height of $s_1$ is positive, change exactly $\delta \cdot \sqrt{n}$ $-1$'s in $s_2$ to $1$ (if they exist).
Similarly, if the height of $s_1$ is negative, change exactly $\delta \cdot \sqrt{n}$ $1$'s in $s_2$ to $-1$ (if
they exist). Call the resulting sequence $s_2'$.

3. Augment: If there aren't enough $(-1)$'s to flip in $s_2$, then add 2 to some
of the numbers so that the increase in height is exactly $\delta \cdot \sqrt{n}$. Similarly for
$1$'s.

4. Set $s = s_1 \cdot s_2'$, i.e. the concatenation of $s_1$ and $s_2'$.
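
The effect of this construction on heights can be explored with a small simulation. The sketch below tracks only the height of the sample, not the sequence itself, and assumes that each merge shifts the height by exactly $\delta\sqrt{n}$ in the direction of the first half (as the Augment step guarantees). The function name `aopt_frw_height`, the uniform $\pm 1$ base sequences of length `base_len`, and the treatment of a zero-height first half as negative are assumptions made for illustration.

```python
import math
import random


def aopt_frw_height(n, base_len, delta, rng):
    """Height of one length-n sample, tracking heights only.

    Assumes n = base_len * 2**k. Base sequences are taken to be uniform
    +-1 bits (an assumption of this sketch).
    """
    if n <= base_len:
        return sum(rng.choice((-1, 1)) for _ in range(n))
    half = n // 2
    h1 = aopt_frw_height(half, base_len, delta, rng)
    h2 = aopt_frw_height(half, base_len, delta, rng)
    # Steps 2-3: the second half is pushed by delta * sqrt(half) towards
    # the sign of the first half. A zero first-half height is treated as
    # negative here, which is an arbitrary choice of this sketch.
    shift = delta * math.sqrt(half)
    return h1 + h2 + (shift if h1 > 0 else -shift)


rng = random.Random(0)
samples = [aopt_frw_height(1024, 64, 0.1, rng) for _ in range(200)]
print(sum(abs(h) for h in samples) / len(samples))
```

Comparing the printed average against a run with `delta=0.0` gives a rough feel for how the bias increases the typical deviation.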

First we observe that when $l$ = T^'^l^, the probability of executing step Augment
is exponentially small in $T$. To see this note that if all the base sequences
of length $l$ have at least $c(T) := \delta\sqrt{T}\log T$ $(-1)$'s and at least $c(T)$ $1$'s then the
step Augment is never called. This is because every inductive step removes at
most $\delta\sqrt{T}$ $1$'s or $-1$'s at each stage and the number of times a base sequence is
modified is at most $\log(T/l) < \log T$. Now note that by the Chernoff bound, the
probability that a given base sequence does not have $c(T)$ $1$'s or $(-1)$'s is exponentially
small in $T$. Finally note that the number of base sequences is at most $T/l$, so
we can simply take a union bound over all of them.

For brevity, let $D :=$ Opt-FRW$_{l,\delta}$ and $D' :=$ AOpt-FRW$_{l,\delta}$. The next observation
is that it suffices to prove that $E[|h(D')|]$ is $\sqrt{T}(1 + \Omega(\delta\log T))$ and
$E[h(D')^2] = O(E[|h(D')|]^2)$ to prove the theorem. To see this, let $NA$ be the
event that the step Augment is never executed at any point in the construction.
Then we have:

$$\begin{aligned}
E_{s\sim D}[|h(s)|] &\ge E_{s\sim D}[|h(s)| \mid NA] \\
&= E_{s\sim D'}[|h(s)| \mid NA] \\
&\ge E_{s\sim D'}[|h(s)|] - \Pr[\overline{NA}] \cdot \max|h(s)|
\end{aligned}$$

We already saw that $\Pr[\overline{NA}]$ is exponentially small in $T$. Note that the maximum
value of $|h(s)|$ is at most $T + \delta(\sqrt{T/2} + 2\sqrt{T/4} + 4\sqrt{T/8} + \cdots)$,
which is bounded by a polynomial in $T$. Thus, if $E[|h(D')|]$ is $\sqrt{T}(1 + \Omega(\delta\log T))$
then so is $E[|h(D)|]$. It is also easy to see that the maximum value of $h(s)^2$ is
polynomial in $T$. This fact combined with our assumption about $D'$, $E[h(D')^2] =
O(E[|h(D')|]^2)$, implies that $E[h(D)^2] = O(E[|h(D)|]^2)$. Applying Lemma ^ to
distribution $D$ we get that deviation $\sqrt{T}(1 + \Omega(\delta\log T))$ is achieved with constant
probability as required.

So to reiterate, we need to prove two things:- 



- $E[|h(D')|]$ is $\sqrt{T}(1 + \Omega(\delta\log T))$

- $E[h(D')^2] = O(E[|h(D')|]^2)$

From now on, we denote by $S_T$ a random sequence $S$ drawn from the distribution
$D'(T)$. The random variable $h(S_T)$ can be written as $h(A_{T/2}) + h(B_{T/2}) + R$
where $R$ is $\delta\sqrt{T/2}$ if $h(A_{T/2}) > 0$ and $-\delta\sqrt{T/2}$ otherwise. Here the pairs of
variables $(A_{T/2}, B_{T/2})$ and $(B_{T/2}, R)$ are independent. Now define $h_T := h(S_T)$. We
see that,

$$\begin{aligned}
E[h_T^2] &= E[h(S_T)^2] \\
&= E[(h(A_{T/2}) + h(B_{T/2}) + R)^2] \\
&= E[h(A_{T/2})^2] + E[h(B_{T/2})^2] + E[R^2] + 2E[h(A_{T/2}) R] \\
&= 2E[h_{T/2}^2] + \delta^2 T/2 + 2\delta\sqrt{T/2}\, E[|h_{T/2}|] \\
&\ge 2E[h_{T/2}^2] + \delta\sqrt{2T}\, E[|h_{T/2}|]
\end{aligned}$$

The following claim gives a lower bound for $E[|h_T|]$.



Claim 19

$$E[|h_T|] \ge \frac{E[h_T^2]^2}{E[h_T^4]^{3/4}}$$



Proof: Let the random variables $X, Y$ be defined as $X := |h_T|^{1/2}$, $Y := |h_T|^{3/2}$.
By Cauchy-Schwarz,

$$\begin{aligned}
E[XY]^2 &\le E[X^2] \cdot E[Y^2] \\
\Rightarrow\; E[h_T^2]^2 &\le E[|h_T|] \cdot E[|h_T|^3] \le E[|h_T|] \cdot E[h_T^4]^{3/4} \\
\Rightarrow\; E[|h_T|] &\ge \frac{E[h_T^2]^2}{E[h_T^4]^{3/4}}
\end{aligned}$$



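Thus, we can say that

$$E[h_T^2] \ge 2E[h_{T/2}^2] + \delta\sqrt{2T}\, E[|h_{T/2}|] \ge 2E[h_{T/2}^2] + \delta\sqrt{2T} \cdot \frac{E[h_{T/2}^2]^2}{E[h_{T/2}^4]^{3/4}}$$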



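First, let's complete the proof assuming that $\frac{E[h_T^2]^2}{E[h_T^4]} \ge C$ for all $T$, where $C$ is
an absolute constant. Let's substitute $g_T^2 := E[h_T^2]/T$. Then,

$$\begin{aligned}
E[h_T^2] &\ge 2E[h_{T/2}^2] + \Omega(\delta)\sqrt{2T}\sqrt{E[h_{T/2}^2]} \\
\Rightarrow\; T \cdot g_T^2 &\ge 2 \cdot (T/2) \cdot g_{T/2}^2 + \Omega(\delta) \cdot \sqrt{2T} \cdot \sqrt{T/2} \cdot g_{T/2} \\
\Rightarrow\; g_T^2 &\ge g_{T/2}^2 + \Omega(\delta) \cdot g_{T/2} = (g_{T/2} + \Omega(\delta))^2 - O(\delta^2) \\
\Rightarrow\; g_T &\ge g_{T/2} + \Omega(\delta)
\end{aligned}$$

For the base case, we have $E[h_l^2] = l$, thus $g_l = 1$. Since $l$ is a fixed power of $T$,

$$g_T \ge 1 + \Omega(\delta) \cdot \log(T/l) = 1 + \Omega(\delta\log T)$$

Thus, $E[h_T^2] \ge T \cdot g_T^2 = T(1 + \Omega(\delta\log T))^2$. By Claim 19 and Lemma 2 this
implies $E[|h_T|] \ge \sqrt{T}(1 + \Omega(\delta\log T))$. These statements together imply both the
guarantees we set out to prove about $D'$.
It remains to prove the following lemma.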
It remains to prove the following lemma. 



Lemma 2. $\frac{E[h_T^2]^2}{E[h_T^4]} \ge C$ for all $T$, where $C$ is an absolute constant.



Proof: Recall that for $S_T$ drawn according to AOpt-FRW$_{l,\delta}(T)$, we have
$h(S_T) = h(A_{T/2}) + h(B_{T/2}) + R$. We already saw that $E[h_T^2] \ge 2E[h_{T/2}^2]$. Let
$r_T := \frac{E[h_T^4]}{E[h_T^2]^2}$. We need to show that $r_T < C$. We have,

$$r_T = \frac{E[h_T^4]}{E[h_T^2]^2} \le \frac{E[h_T^4]}{4E[h_{T/2}^2]^2}$$

Now, let's write a recurrence for $E[h_T^4]$, writing $h_A := h(A_{T/2})$ and $h_B := h(B_{T/2})$:

$$\begin{aligned}
E[h_T^4] &= E[(h_A + h_B + R)^4] \\
&= E[h_A^4] + E[h_B^4] + E[R^4] + 6E[h_A^2 h_B^2] + 6E[h_A^2 R^2] + 6E[h_B^2 R^2] + 4E[h_A R^3] + 4E[h_A^3 R] \\
&= 2E[h_{T/2}^4] + \delta^4 (T/2)^2 + 6E[h_{T/2}^2]^2 + 12\delta^2 (T/2) E[h_{T/2}^2] + 4\delta^3 (T/2)^{3/2} E[|h_{T/2}|] + 4\delta\sqrt{T/2}\, E[|h_{T/2}|^3] \\
&\le 2E[h_{T/2}^4] + O(\delta^4 T^2) + 6E[h_{T/2}^2]^2 + O(\delta^2 T\, E[h_{T/2}^2]) + O(\delta^3 T^{3/2})\sqrt{E[h_{T/2}^2]} + O(\delta\sqrt{T})\, E[h_{T/2}^4]^{3/4}
\end{aligned}$$

Dividing both sides by $4E[h_{T/2}^2]^2$ and using the fact that $E[h_T^4] \ge E[h_T^2]^2 \ge T^2$,
we get:

$$\begin{aligned}
r_T &\le \frac{E[h_T^4]}{4E[h_{T/2}^2]^2} \\
&\le (1/2) \cdot r_{T/2} + O(\delta^4) + (3/2) + O(\delta^2) + O(\delta^3) + O(\delta)\, r_{T/2}^{3/4} \\
&\le (3/4) \cdot r_{T/2} + O(1)
\end{aligned}$$

which is clearly bounded above by an absolute constant for all $T$.
Thus, the theorem is proved.



Theorem 20. The distribution $D :=$ FRW$_{l,\delta}$ is $O(\delta)$-unpredictable in a weak
sense, i.e. $E_{s,I}[A_s(I)] \le O(\delta) \cdot h_{|I|}$ where $h_n := E_{s\sim D(n)}[|h(s)|]$. Here $s$, $I$ and $A$
are as in the definition of unpredictability. Note that the expectation on the left is taken over $I$ as
well as the prefix $s$, as opposed to that definition, where $s$ is fixed and the expectation
is over $I$ only.

Proof: [Sketch] 

It can be shown that the process FRW$_{l,\delta}$ has very similar properties if in Step
2 of the construction, instead of changing exactly $\delta \cdot h(s_1)$ bits in $s_2$ we change
each bit (of appropriate sign) in $s_2$ with probability ^^'"•^ . Here we assume this
fact without proving it.

We need to show that $E_{s,I}[A_s(I)] \le O(\delta) \cdot h_{|I|}$. We may assume that $I$ is an
aligned interval (Claim 11). Let $s(i)$ be the suffix of length $i$ in $s$. Then,



E,[E,[A,(/)] 

4=0 i=log|7|+l 

<OiS)-E,[\hisi\I\m + 
0(5).E,[|/.K|/|))|](1/2V4 + 1/41/4 + ...) 

<0{S)-E^[\h{s{\I\m 
=0{5) ■ /i|,| 

where the second inequality uses hr = T^/^+^(^'> < 7^3/4 (xheoremM 



Claim 21 For $n = 2^i \cdot l$, $S \sim$ AFRW$_{l,\delta}(n)$,

$$h(S) = \sum_{U \subseteq [i]} r^{|U|} h(X_U) \qquad \text{(B.1)}$$

where $r = (1 + \delta)$ and each $X_U$ is independently and uniformly distributed in
$\{-1, 1\}^l$.

Proof: We will prove the claim by induction on $i$. For $i = 0$, the claim clearly
holds.

Assume that the claim holds for $i = k$, and let $n := 2^{k+1} \cdot l$. Let $S = S_1 \cdot S_2'$ be
the sequence produced by the distribution as described above where $S_1$ and $S_2$
are random sequences of length $n/2$ each. Because of step Augment, it is clear
that $h(S_2') = h(S_2) + \delta h(S_1)$ which means $h(S) = (1 + \delta)h(S_1) + h(S_2)$. Thus,

$$\begin{aligned}
h(S) &= r\,h(S_1) + h(S_2) \\
&= \sum_{U \subseteq [k]} r\,(r^{|U|} h(X_U)) + \sum_{V \subseteq [k]} r^{|V|} h(X_V) \\
&= \sum_{U \subseteq [k]} r^{|U|+1} h(X_U) + \sum_{V \subseteq [k]} r^{|V|} h(X_V) \\
&= \sum_{U \subseteq [k+1]} r^{|U|} h(X_U)
\end{aligned}$$
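
The expansion (B.1) can be sanity-checked numerically. The sketch below compares the recursion $h(S) = r\,h(S_1) + h(S_2)$ against the closed-form sum, identifying each base sequence with the binary expansion of its position: a base sequence picks up one factor of $r$ for every level at which it falls in the first half. The function names and the indexing convention are choices made for this illustration only.

```python
def height_by_recursion(base_heights, r):
    """Apply h(S) = r*h(S1) + h(S2) recursively down to the base heights."""
    if len(base_heights) == 1:
        return base_heights[0]
    mid = len(base_heights) // 2
    first = height_by_recursion(base_heights[:mid], r)
    second = height_by_recursion(base_heights[mid:], r)
    return r * first + second


def height_by_expansion(base_heights, r):
    """Closed form: sum of r^{|U|} * h(X_U) over all base sequences.

    Leaf j (written with i bits) lies in the first half once per zero bit,
    so |U| is the number of zero bits of j. Requires a power-of-two length.
    """
    i = (len(base_heights) - 1).bit_length()
    total = 0
    for j, h in enumerate(base_heights):
        zeros = i - bin(j).count("1")
        total += r ** zeros * h
    return total


heights = [1, -2, 3, 4, -5, 6, 7, -8]
print(height_by_recursion(heights, 3) == height_by_expansion(heights, 3))
```

An integer value of `r` is used in the usage line so that the comparison is exact; with $r = 1 + \delta$ one would compare up to floating-point tolerance.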



Lemma 3. Median of $|h(\mathrm{AFRW}_{l,\delta})|$ is n^+"^^'> .

Proof: In the notation of Theorem 28 we think of each term in Equation (B.1)
as a random variable. There are exactly $2^i$ terms. It is clear that $E[X_S] = 0$ for
all $S \subseteq [i]$. Also, $E[X_S^2] = r^{2|S|} \cdot l$ and $E[|X_S|^3] = r^{3|S|} \cdot l$,



^2 .= ^^2|5| .l^(r^ + iy.l^(^2 + 2Sy ■ I « ni+« 



Also, maxs Ps/<^s = maxsrl'^l = r* sa n^^°> . Thus, we have 



|5-7V(0,a2)|<n-l/2.„OW<„-^^(l) 

Thus, the distribution of |/i(AFRWi,A-)| is very close to a half-normal dis- 
tribution with variance a-"^ and thus the median of |/i(AFRW;5)| is J7(cr) = 



Claim 22 The probability that step Augment is executed at a particular step 
is at most r~^°. 



Proof: This is a simple application of Theorem 27.

Let's say we are at the step where the length of the sequences is $n := 2^i \cdot l$. We
consider random variables $Y_S := r^{|S|} \cdot X_S$ where $X_S$ is as in Lemma 3. Observe
that a single random variable $Y_S$ is actually a sum of $l$ independent random
variables each of which takes values in $\{-r^{|S|}, r^{|S|}\}$. Let us denote these random
variables as $Y_{S,i}$ so that $X_S = \sum_{i=1}^{l} Y_{S,i}$. Thus, in the notation of Theorem

S,i S,i S 

where $\sigma^2$ is as in Lemma 3.

The step Augment is executed only when $h_S > n - \delta n$ or $h_S < -(n - \delta n)$.
Thus, we have the bound



Vr[\hs\>n-5ri] <V,[\hs\> n/i] < 2.exp (^-^^^^ j <2.exp^-^^) < T" 
as desired. 



Claim 23 If distribution $D(T)$ is $\epsilon$-unpredictable with respect to all aligned intervals
then it is $c \cdot \epsilon$-unpredictable with respect to all intervals, where c :~ %

Proof: Consider an interval $I$ of size $x$. If $I$ is an aligned interval we are done,
otherwise we write it as the minimal union of aligned intervals (take out the
largest aligned interval in $I$ and repeat). There are three possibilities:

1. $I = I_1 \cup I_2$ is a union of two intervals of size $x/2$ each (e.g. the interval
$[T/4 + 1, 3T/4]$)

2. $I = I_1 \cup I_2 \cup \ldots \cup I_k$, where each $I_j$ is of a different size. Note that all interval
sizes on the right are powers of 2 and strictly less than $x$

3. $I = J \cup J'$ where each of $J$, $J'$ can be written as a union of intervals as in 1 or 2
above
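
The decomposition into aligned intervals can be sketched as follows, assuming that "aligned" means dyadic: an interval of length $2^j$ whose start is a multiple of $2^j$. The sketch uses 0-indexed half-open intervals and a left-to-right greedy scan rather than literally removing the largest piece first; both conventions, and the function name `aligned_decomposition`, are assumptions of this illustration.

```python
def aligned_decomposition(lo, hi):
    """Split the half-open interval [lo, hi) into dyadic (aligned) pieces.

    A piece [a, a + size) is aligned if size is a power of two and a is a
    multiple of size. Scanning left to right and always taking the largest
    aligned piece that starts at the current position gives a minimal
    decomposition.
    """
    pieces = []
    while lo < hi:
        size = 1
        # Grow the piece while it stays aligned at `lo` and inside [lo, hi).
        while lo % (2 * size) == 0 and lo + 2 * size <= hi:
            size *= 2
        pieces.append((lo, lo + size))
        lo += size
    return pieces


# With T = 16, the interval [T/4 + 1, 3T/4] is [4, 12) in this convention.
print(aligned_decomposition(4, 12))
print(aligned_decomposition(3, 11))
```

The first call illustrates the first case (two pieces of equal size); the second produces pieces of sizes 1, 4, 2, 1, which falls under the third case.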

In the first case,

$$|E[h_I]| \le |E[h_{I_1}]| + |E[h_{I_2}]| \le 2 \cdot \epsilon \cdot \sqrt{x/2} = \sqrt{2} \cdot \epsilon \cdot \sqrt{x}$$
In the second case, 

k 



In the third case, 

\E[hj]\ < \E[hj]\ + \E[h'j]\ < -^ -e-VlJl- 



e ■ vx 



^/2-\ ^ V2-1 



C Fractal nature of deterministic inverting sequences 

We will argue that the optimal sequence with height $h$ and inversion ratio $\alpha$ is
achieved by the following fractal-like recursive process. To construct a sequence
of height $h$, recursively generate a sequence $s_1$ of height $((1 + \alpha)/2) \cdot h$ and $s_2$
of height $\alpha \cdot h$ respectively. Concatenate $s_1$, an inverted copy of $s_2$ followed by
another copy of $s_1$. For simplicity of explanation we will ignore rounding errors
from the discretization.
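
A minimal sketch of this recursive process is given below. The base case (a run of $+1$ bits once the target height is at most `base`), the rounding of the sub-heights, and the function name `inverting_sequence` are assumptions introduced for illustration; because of rounding, the resulting height only approximates the target in general.

```python
def inverting_sequence(h, alpha, base=4):
    """Build a +-1 sequence of height roughly h by the recursive process:
    s1, then an inverted copy of s2, then s1 again, where s1 targets height
    (1 + alpha) * h / 2 and s2 targets height alpha * h.

    Assumes 0 < alpha < 1. Heights at most `base` are realised by a plain
    run of +1 bits (an assumption of this sketch).
    """
    if h <= base:
        return [1] * h
    # Clamp to h - 1 so that rounding can never stall the recursion.
    h1 = min(h - 1, round((1 + alpha) * h / 2))
    h2 = min(h - 1, round(alpha * h))
    s1 = inverting_sequence(h1, alpha, base)
    s2 = inverting_sequence(h2, alpha, base)
    return s1 + [-b for b in s2] + s1


seq = inverting_sequence(8, 0.5, base=6)
print(seq, sum(seq))
```

In the usage line the two sub-heights are 6 and 4, so the sequence rises by 6, falls by 4, and rises by 6 again, for a total height of 8.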

It turns out that for large $h$, the ratio of lengths of $s_1$ and $s_2$ is fixed to
$((1+\alpha)/2)^{1/\theta} : \alpha^{1/\theta}$ where $\theta$ is a constant defined below.

Claim. The above process produces an $\alpha$-inverting sequence for $\alpha$ smaller than
some constant.

Proof: Observe that by recurrence any interval that is contained within $s_1$ or $s_2$
is $\alpha$-inverting. The full interval consisting of the three concatenated strings also
has an $\alpha$-inversion; and so do the intervals that span the first two and the last
two strings. So we only need to argue about intervals that span parts of multiple
of these pieces. Consider for example an interval that spans across some suffix
of $s_1$ and some prefix of the inverted copy of $s_2$. Now for small enough $\alpha$, the
two parts of the interval have heights of opposite signs. So the $\alpha$-inversion in the
piece with the larger absolute height suffices to produce an $\alpha$-inversion in the
interval. The same argument can be applied for intervals that span part of the
first and the third sequence. ■

Claim. Let $s$ be an $\alpha$-inverting sequence of length $t$ (as defined earlier), where $\alpha$ is
bounded above by a constant. Then the highest deviation that can be achieved
by $s$ for large $t$ is $t^\theta$ where $\theta$ is the solution to the equation $1 = 2((1 + \alpha)/2)^{1/\theta} +
\alpha^{1/\theta}$. Furthermore, this deviation is actually achieved by the above process.

Proof: [Sketch] We will compute the amount of time $t(h)$ when the process
described above first achieves a height $h > 0$. By the construction, $t(h)$ satisfies
the recurrence $t(h) = 2t((1 + \alpha)h/2) + t(\alpha h)$. In the limit, if this recurrence
has a solution of the form $h^{1/\theta}$ then note that $h^{1/\theta} = 2((1 + \alpha)h/2)^{1/\theta} + (\alpha h)^{1/\theta}$
which means that $1 = 2((1 + \alpha)/2)^{1/\theta} + \alpha^{1/\theta}$. The proof can be formalized by
sandwiching the solution to the recurrence in the limit between the functions
$h^{1/\theta_1}$ and $h^{1/\theta_2}$ where $\theta_1$ and $\theta_2$ approach $\theta$ from above and below.
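
The exponent $\theta$ has no obvious closed form, but it can be found numerically. The sketch below solves $1 = 2((1 + \alpha)/2)^{1/\theta} + \alpha^{1/\theta}$ by bisection on $\theta \in (0, 1]$, using the fact that the right-hand side is increasing in $\theta$ when $0 < \alpha < 1$. The function name `inversion_exponent`, the bracket, and the tolerance are illustrative choices.

```python
def inversion_exponent(alpha, tol=1e-12):
    """Solve 1 = 2*((1 + alpha)/2)**(1/theta) + alpha**(1/theta) for theta.

    Assumes 0 < alpha < 1 and that the root lies in [1e-3, 1], which holds
    unless alpha is extremely close to 1.
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")

    def residual(theta):
        p = 1.0 / theta
        return 2 * ((1 + alpha) / 2) ** p + alpha ** p - 1

    lo, hi = 1e-3, 1.0
    if residual(lo) >= 0:
        raise ValueError("root not bracketed; alpha is too close to 1")
    # residual(1.0) = 2*alpha > 0, and residual is increasing in theta.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if residual(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(inversion_exponent(0.5), inversion_exponent(0.1))
```

Smaller inversion ratios give exponents closer to 1, matching the intuition that a sequence with almost no forced inversion can rise almost linearly.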

To prove the lower bound, let $t(h)$ denote the required time to produce a
height of absolute value $h$ for any $\alpha$-inverting sequence. We will prove that for
large $h$, $t(h)$ approaches $h^{1/\theta}$. We know that for large enough $t$ there must be
an inversion with ratio $\alpha$. So to achieve height $h$ in time $t$ there must be a
sub-interval with height less than $-\alpha h$. So $t$ can be broken into three segments
of lengths $t_1, t_2, t_3$ with heights $h_1, h_2, h_3$ such that $h = h_1 + h_2 + h_3$ where
$h_2 < -\alpha h$. We wish to minimize $t(h) = t_1 + t_2 + t_3 \ge t(h_1) + t(h_2) + t(h_3)$. Since
$t(h)$ is non-decreasing in $h$, we may set $h_2 = -\alpha h$ and $h_1 + h_3 = h - h_2 = (1 + \alpha)h$,
giving $t(h) = \min\, t(h_1) + t(h_3) + t(\alpha h)$ where $h_1 + h_3 = (1 + \alpha)h$.

Note that if $t(h)$ is of the form $h^{1/\theta}$ then it is convex and so $t(h_1) + t(h_3)$
is minimized when $h_1 = h_3 = \frac{(1+\alpha)h}{2}$, giving $t(h) = 2t\left(\frac{(1+\alpha)h}{2}\right) + t(\alpha h)$, whose
solution approaches $h^{1/\theta}$ in the limit. That the solution must approach $h^{1/\theta}$ follows by
looking at the behavior of $\log_t h$ in the limit and sandwiching it between $\theta_1$ and
$\theta_2$ that approach $\theta$ from above and below. ■



D Miscellaneous Observations 

Observation 24 A uniform random sequence is $(\alpha, q)$-inverting (as defined earlier)
for some constants $\alpha, q$. Further, the probability parameter $q$ can be made as high
as $1 - \epsilon$ by reducing the inversion ratio $\alpha$ to $\Theta(1/\log(1/\epsilon))$.

Proof: Let us divide the interval of length $x$ into two halves of length $x/2$ each.
With probability $1/2$ the two parts have opposite heights and with constant
probability both heights have magnitude $\Theta(\sqrt{x})$. Thus it has an $\alpha$-inversion with
some constant probability for some constant $\alpha$. The higher probability statement
is obtained similarly by dividing it into $\log(1/\epsilon)$ intervals of equal length. ■

Observation 25 If a string is sampled from the highest entropy distribution
with deviation $k\sqrt{T}$, then it is possible to get an expected payoff of $\Omega(1) \cdot k\sqrt{T}$
for $k = \Omega(1)$.

Proof: [Sketch] The algorithm simply predicts the sign of $h(s)$ where $s$ is the
sequence seen in the first half, i.e. $|s| = T/2$. A simple computation proves the
observation. ■

The following claim shows that the FBM with $H = 1/2 + \delta$ is not $O(\delta)$-unpredictable.
In fact, an algorithm can get an expected payoff of $\Theta(x^H)$ on an
interval of size $x$ by predicting the sign of the height of the preceding interval
of length $x$. (It can also be shown that one cannot do better than this if one is
only allowed to use the sign of the height of some preceding interval.)

Claim. The algorithm that predicts an interval of length $x$ using the sign of the
height of the preceding interval of length $x$ gets an expected payoff of $\Theta(x^H)$
where the expectation is taken over all values in the preceding interval. Further,
it is optimal to use a preceding interval of length $x$ if one is using the sign of its
height.

Proof. $E[B_H(sx) \mid B_H(x)]/B_H(x) = (1/2)(s^{2H} + 1 - |s - 1|^{2H})$ (see [5], Section
5.3).

Let us compute the expected payoff if one uses the height of the preceding
interval of length $sx$ to predict the following interval of length $x$.

$E[B_H((s+1)x) \mid B_H(sx)]/B_H(sx) = (1/2)((1 + 1/s)^{2H} + 1 - (1/s)^{2H})$. Hence
$E[B_H((s+1)x) - B_H(sx) \mid B_H(sx)] = (1/2)((1 + 1/s)^{2H} - 1 - (1/s)^{2H})\, B_H(sx)$. So by
predicting the sign of $B_H(sx)$ to predict the following interval of length $x$ the expected
payoff is $(1/2)((1 + 1/s)^{2H} - 1 - (1/s)^{2H})\, E[\mathrm{sign}(B_H(sx))\, B_H(sx)] =
(1/2)((1 + 1/s)^{2H} - 1 - (1/s)^{2H})\, E[|B_H(sx)|] =
\Theta((sx)^H) \cdot (1/2)((1 + 1/s)^{2H} - 1 - (1/s)^{2H})$.

Note that for $s = 1$, this is $\Theta(x^H)$. Further, this is the best possible value of
the above expression.
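
The dependence of this payoff on $s$ can be inspected numerically. The sketch below evaluates the $s$-dependent factor $s^H \cdot (1/2)((1 + 1/s)^{2H} - 1 - (1/s)^{2H})$, dropping the common factor $x^H$ and the constant hidden in $\Theta(\cdot)$. The function name `fbm_payoff_factor` and the sample value $H = 0.6$ are assumptions of this illustration; it is a spot check on a few values of $s$, not a proof of optimality.

```python
def fbm_payoff_factor(s, hurst):
    """s-dependent part of the expected payoff when the sign of the height
    of the preceding interval of length s*x predicts the next x bits.

    Equals s**H * (1/2) * ((1 + 1/s)**(2H) - 1 - (1/s)**(2H)); the common
    factor x**H and the constant in Theta(.) are omitted.
    """
    two_h = 2 * hurst
    correlation = 0.5 * ((1 + 1 / s) ** two_h - 1 - (1 / s) ** two_h)
    return s ** hurst * correlation


for s in (0.25, 0.5, 1.0, 1.5, 2.0, 4.0):
    print(s, fbm_payoff_factor(s, 0.6))
```

For $H = 1/2$ the factor vanishes, consistent with an unbiased random walk offering no such advantage.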

Observation 26 With continuous prediction the FBM and its binary (discretized)
variants have a payoff of $\Omega(\delta T)$.

Proof: Observe that if we take a sequence of length 2 the second bit is correlated
to the first by $\Theta(\delta)$. This is true of every even bit. The observation follows for
the binary variants. For the true FBM the statement holds since if $B_1$ and $B_2$
are the heights in two adjacent unit intervals of the FBM process with Hurst
coefficient $H = 1/2 + \Theta(\delta)$ then

$E[B_1 + B_2 \mid B_1]/B_1 = (1/2)(2^{2H} + 1 - 1^{2H}) = 2^{\Theta(\delta)}$ (see [5], Section 5.3).

Therefore $E[B_2 \mid B_1] = (2^{\Theta(\delta)} - 1) \cdot B_1 = \Theta(\delta) \cdot B_1$ for $\delta < 1$. So again
by predicting the sign of $B_1$ one can get a payoff of $\Theta(\delta) \cdot B_1 \cdot \mathrm{sign}(B_1) =
\Theta(\delta) \cdot |B_1|$. This in expectation is $\Theta(\delta)$ as $B_1$ is normally distributed with constant
variance. ■

Claim. For any random variable $X$ that only takes non-negative values and satisfies
$E[X^2] = O(E[X]^2)$, $\Pr[X \ge \Omega(E[X])] = \Omega(1)$.

Proof: Let $\mu = E[X]$. Then the standard deviation $\sigma = O(\mu) = c\mu$ (say) where
$c$ is at most some constant. We will bound $E[X \mid X > \mu + rc\mu]$ for any $r \in \mathbb{N}$.
Note that $\Pr[X > \mu + rc\mu] \le 1/r^2$.

So $E[X \mid X > \mu + rc\mu] \le (\mu + rc\mu) + c\mu \sum_{i > r} 1/i^2 \le (\mu + rc\mu) + c\mu/r$.

Now $\mu = E[X] = \Pr[X \le \mu + rc\mu]\, E[X \mid X \le \mu + rc\mu] + \Pr[X > \mu +
rc\mu]\, E[X \mid X > \mu + rc\mu] \le (1 - 1/r^2)\, E[X \mid X \le \mu + rc\mu] + (1/r^2)(\mu + rc\mu + c\mu/r)$.
By setting $r$ to be a constant that is at least some large multiple of $c$, we can
conclude that $E[X \mid X \le \mu + rc\mu] = \Omega(\mu)$. So this conditioned random variable
$X$ has maximum value and mean value that are the same up to constant factors.
Thus it must exceed $\Omega(\mu)$ with constant probability. So the unconditioned random
variable $X$ must also exceed $\Omega(\mu)$ with a smaller constant probability. ■



E Basic tools 

Theorem 27 (Hoeffding's bound).

Let $X_1, X_2, \ldots, X_n$ be independent random variables such that $E[X_i] = 0$ and
$\Pr[X_i \in [a_i, b_i]] = 1$. Let $S := \sum_i X_i$. Then,

$$\Pr[|S| > y] \le 2 \cdot \exp\left(-\frac{2y^2}{\sum_i (b_i - a_i)^2}\right)$$
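
As a quick numerical illustration, the sketch below evaluates the standard form of Hoeffding's bound, $2\exp(-2y^2/\sum_i (b_i - a_i)^2)$, and compares it with an empirical tail frequency for sums of uniform $\pm 1$ bits. The function names, the choice of $n = 100$ and $y = 30$, and the number of trials are assumptions made for this example.

```python
import math
import random


def hoeffding_bound(y, ranges):
    """Standard two-sided Hoeffding bound on Pr[|S| >= y] for a sum S of
    independent zero-mean variables with X_i in [a_i, b_i]."""
    width_sq = sum((b - a) ** 2 for a, b in ranges)
    return 2 * math.exp(-2 * y * y / width_sq)


def empirical_tail(n, y, trials, rng):
    """Fraction of trials in which a sum of n uniform +-1 bits has |S| >= y."""
    hits = 0
    for _ in range(trials):
        s = sum(rng.choice((-1, 1)) for _ in range(n))
        if abs(s) >= y:
            hits += 1
    return hits / trials


bound = hoeffding_bound(30, [(-1, 1)] * 100)
print(bound, empirical_tail(100, 30, 2000, random.Random(0)))
```

For $\pm 1$ bits each range has width 2, so the bound here is $2\exp(-4.5)$; the empirical frequency is typically well below it, since Hoeffding's bound is not tight.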



Theorem 28 (Berry-Esseen Theorem).

Let $X_1, X_2, \ldots, X_n$ be independent random variables such that $E[X_i] = 0$,
$E[X_i^2] = \sigma_i^2 > 0$, and $E[|X_i|^3] = \rho_i < \infty$. Let $\sigma^2 := \sum_i \sigma_i^2$ and $S := \sum_i X_i$.
Then there is an absolute constant $C$ such that

$$|S - N(0, \sigma^2)| \le \frac{C}{\sigma} \cdot \max_i \frac{\rho_i}{\sigma_i^2}$$

Here $|D - D'| := \max_x |\Pr[D > x] - \Pr[D' > x]|$ denotes the statistical distance
between distributions $D$ and $D'$ and $N(\mu, \sigma^2)$ denotes the normal distribution
with mean $\mu$ and variance $\sigma^2$.